The notebook uses the scib package as a wrapper for all integration methods; however, the wrapper does not allow setting method parameters and always uses the defaults. I strongly recommend calling the methods themselves in real work; this notebook is for demonstration purposes only.

Download and check the data

The data was taken from here: https://figshare.com/articles/dataset/Benchmarking_atlas-level_data_integration_in_single-cell_genomics_-_integration_task_datasets_Immune_and_pancreas_/12420968

This looks like non-log-transformed data (it is better to compare the values in X and "counts" directly).
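
One way to make that comparison concrete is a small heuristic: raw counts are non-negative integers, while normalized and log-transformed values are generally not. The sketch below uses plain NumPy on synthetic data; the helper name `looks_like_raw_counts` and the demo matrices are hypothetical stand-ins, not part of the notebook or of scib.

```python
import numpy as np


def looks_like_raw_counts(matrix, tol=1e-8):
    """Heuristic check: True if all values are non-negative and integer-valued."""
    values = np.asarray(matrix, dtype=float)
    is_non_negative = np.all(values >= 0)
    is_integer_valued = np.all(np.abs(values - np.round(values)) < tol)
    return bool(is_non_negative and is_integer_valued)


# Synthetic stand-ins for the "counts" layer and a log-normalized X (assumption).
rng = np.random.default_rng(0)
demo_counts = rng.poisson(5.0, size=(100, 20)).astype(float)
size_factors = demo_counts.sum(axis=1, keepdims=True)
demo_lognorm = np.log1p(demo_counts / size_factors * 1e4)

print("counts look raw:", looks_like_raw_counts(demo_counts))
print("log-normalized look raw:", looks_like_raw_counts(demo_lognorm))
print("max value in counts vs. log-normalized:", demo_counts.max(), demo_lognorm.max())
```

On the real object the same function could be applied to dense copies of X and of the "counts" layer (a sparse matrix would first need to be converted, for example with `.toarray()`).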

UMAP without integration

scib

Preprocessing

Cell cycle

According to https://github.com/theislab/scib-reproducibility/tree/main/notebooks/data_preprocessing/pancreas the data is already normalized and log-transformed, so we will skip scib.preprocessing.normalize

HVG

Does per batch HVG selection change integration?

not really
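
For reference, the idea behind per-batch HVG selection can be sketched as follows: rank genes within each batch, then keep the genes that are highly variable in the most batches. This is a simplified illustration, not scib's or scanpy's implementation; it assumes plain variance as the variability measure (real HVG methods normalize dispersion by mean expression), and the helper name `hvg_per_batch` is hypothetical.

```python
import numpy as np


def hvg_per_batch(X, batches, n_top):
    """Return indices of genes that are top-variance in the most batches."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    votes = np.zeros(X.shape[1], dtype=int)
    for batch in np.unique(batches):
        variances = X[batches == batch].var(axis=0)
        top_in_batch = np.argsort(variances)[::-1][:n_top]
        votes[top_in_batch] += 1  # one vote per batch where the gene is in the top
    # Sort by votes (descending), breaking ties by overall variance (descending).
    overall_variance = X.var(axis=0)
    order = np.lexsort((-overall_variance, -votes))
    return np.sort(order[:n_top])


# Synthetic demo: genes 0 and 1 are highly variable in both batches.
rng = np.random.default_rng(0)
gene_scales = np.array([5.0, 5.0, 1.0, 1.0, 1.0, 1.0])
hvg_demo_X = np.vstack([
    rng.normal(0.0, gene_scales, size=(200, 6)),
    rng.normal(2.0, gene_scales, size=(200, 6)),
])
hvg_demo_batches = np.array(["a"] * 200 + ["b"] * 200)
selected_genes = hvg_per_batch(hvg_demo_X, hvg_demo_batches, n_top=2)
print(selected_genes)
```

Selecting by votes across batches keeps genes whose variability is not driven by a single batch, which is the motivation for doing the selection per batch in the first place.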

integration

Per batch scaling

Per-batch scaling is a batch correction in itself; the paper suggests that it favors batch removal over biological conservation. So I will not use it as input to the integration methods, but will treat it as one of the integration methods.
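
To make clear why per-batch scaling already acts as a batch correction, here is a minimal NumPy sketch: every gene is centered and scaled to unit variance separately within each batch, so differences in per-batch mean and spread disappear. The helper name `scale_per_batch` and the demo data are hypothetical; this is an illustration of the idea, not the implementation used by scib.

```python
import numpy as np


def scale_per_batch(X, batches):
    """Z-score every gene (column) separately within each batch."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    scaled = np.empty_like(X)
    for batch in np.unique(batches):
        mask = batches == batch
        mean = X[mask].mean(axis=0)
        std = X[mask].std(axis=0)
        std[std == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
        scaled[mask] = (X[mask] - mean) / std
    return scaled


# Synthetic demo: two batches with different offsets and spreads.
rng = np.random.default_rng(0)
scale_demo_X = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 5)),
    rng.normal(3.0, 2.0, size=(60, 5)),
])
scale_demo_batches = np.array(["a"] * 50 + ["b"] * 60)
scale_demo_scaled = scale_per_batch(scale_demo_X, scale_demo_batches)

for batch in ("a", "b"):
    block = scale_demo_scaled[scale_demo_batches == batch]
    print(batch, block.mean(axis=0).round(6), block.std(axis=0).round(6))
```

After scaling, both batches have mean 0 and standard deviation 1 for every gene. The same step also erases real biological differences in composition between batches, which is consistent with the trade-off described above.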

Even without integration the UMAP is much better than before, simply because of per-batch scaling and per-batch HVG selection (I think the former is more important).

bbknn

compare with https://theislab.github.io/scib-reproducibility/dataset_pancreas.html#3_Embeddings

combat

compare with https://theislab.github.io/scib-reproducibility/dataset_pancreas.html#3_Embeddings

Scanorama

compare with https://theislab.github.io/scib-reproducibility/dataset_pancreas.html#3_Embeddings

Metrics

load metrics from paper

Compute all metrics

One by one

ARI

NMI

cLISI

iLISI

cell_cycle

I don't really understand what is going on here :) I use the UMAP as the embedding, which is probably not correct (it should be PCA or something else with more dimensions).
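
As an illustration of the higher-dimensional alternative, a PCA embedding can be computed directly from the expression matrix and used in place of the 2D UMAP coordinates. The sketch below uses plain NumPy SVD on synthetic data; the helper name `pca_embedding` and the choice of 50 components are assumptions, and how the result is passed to the metric depends on the scib version.

```python
import numpy as np


def pca_embedding(X, n_comps=50):
    """Project cells onto the first n_comps principal components via SVD."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    n_comps = min(n_comps, S.shape[0])
    return U[:, :n_comps] * S[:n_comps]  # cell coordinates in PC space


# Synthetic demo: 200 cells, 100 genes.
rng = np.random.default_rng(0)
pca_demo_X = rng.normal(size=(200, 100))
pca_demo_embedding = pca_embedding(pca_demo_X, n_comps=50)
print(pca_demo_embedding.shape)
```

Unlike UMAP, PCA is a linear projection that preserves variance, so a score based on how much variance is explained by a covariate is easier to interpret on it.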